AMENDMENTS TO THE CLAIMS

This listing of claims replaces all prior versions and listings of claims in the 
application. 

Please cancel claims 4 and 10 without prejudice. 
In the claims: 

1. (currently amended) A computer-implemented method of identifying table data in a 
document comprising the steps of: 

[[a)]] receiving a page description language representation of the document for 
providing a list of words in the document and position information for the words; and 
[[b)]] automatically identifying table data in the document based on the page 
description language representation of the document and at least one table identifying 
feature, wherein the identifying step includes, 

[[b1)]] dividing the document into one or more pages; 
[[b2)]] dividing each page into a plurality of lines; 
[[b3)]] for each line, clustering the words of the line into one or more word
clusters, wherein each cluster includes one or more words, each cluster having a
horizontal beginning point, horizontal midpoint, and horizontal end point;

[[b4)]] automatically identifying table data in the document based on the 
number of word clusters for each line and the alignment of the horizontal 
beginning points, horizontal midpoints, and horizontal end points of word clusters 
between lines. 
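
For illustration only, the clustering step recited in claim 1 might be sketched as follows. This is a minimal sketch, not the applicant's implementation: the names `WordCluster` and `cluster_line`, the `(text, x_start, x_end)` word format, and the `max_gap` parameter are all assumptions introduced here, and grouping words by horizontal gap is one possible clustering rule among many.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WordCluster:
    """Hypothetical container for one word cluster on a line."""
    words: List[str]
    begin: float  # horizontal beginning point
    end: float    # horizontal end point

    @property
    def mid(self) -> float:
        # Horizontal midpoint, derived from the beginning and end points.
        return (self.begin + self.end) / 2.0


def cluster_line(words: List[Tuple[str, float, float]],
                 max_gap: float = 10.0) -> List[WordCluster]:
    """Group the words of one line into clusters.

    Each word is (text, x_start, x_end). A word joins the previous cluster
    when the horizontal gap to that cluster is at most max_gap (an assumed
    rule); otherwise it starts a new cluster.
    """
    clusters: List[WordCluster] = []
    for text, x0, x1 in sorted(words, key=lambda w: w[1]):
        if clusters and x0 - clusters[-1].end <= max_gap:
            clusters[-1].words.append(text)
            clusters[-1].end = max(clusters[-1].end, x1)
        else:
            clusters.append(WordCluster([text], x0, x1))
    return clusters


line = [("Total", 10, 40), ("sales", 44, 70), ("1,200", 200, 230)]
clusters = cluster_line(line)
```

In this example the first two words fall within the assumed gap and form one cluster, while the third word is far enough away to form its own cluster.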

2. (cancelled) 

3. (currently amended) The method of Claim 1 wherein the step of automatically 
identifying table data in the document based on the number of word clusters of each 
line and the alignment of the word clusters between lines further comprises: 

[[b4_1)]] using the word clusters to generate column position information
wherein the column information includes for each column a horizontal beginning point,
horizontal midpoint, and horizontal end point; and
[[b4_2)]] updating the column position information by performing a union
operation between the column position information of [[the]] a previous line and the
column position information of [[the]] a current line.
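
As a hedged illustration of the union operation recited in claim 3, the sketch below merges overlapping column extents from a previous line and a current line. The claim does not define the union in detail, so interval merging is an assumed interpretation; the `Column` tuple layout and the function name `union_columns` are hypothetical.

```python
from typing import List, Tuple

# Assumed layout: (horizontal beginning point, midpoint, end point).
Column = Tuple[float, float, float]


def union_columns(previous: List[Column], current: List[Column]) -> List[Column]:
    """Merge the column extents of two lines.

    Columns whose horizontal extents overlap are combined into one column
    spanning both; the midpoint is recomputed from the merged extent.
    """
    spans = sorted((b, e) for b, _, e in list(previous) + list(current))
    merged: List[List[float]] = []
    for b, e in spans:
        if merged and b <= merged[-1][1]:
            # Overlaps the last merged column: extend its end point.
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([b, e])
    return [(b, (b + e) / 2.0, e) for b, e in merged]


prev_cols = [(10, 25, 40), (100, 120, 140)]
cur_cols = [(12, 31, 50), (105, 117.5, 130), (200, 215, 230)]
updated = union_columns(prev_cols, cur_cols)
```

Under these assumptions, a column that appears only in the current line is kept as a new column, and overlapping columns widen to cover both lines.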

4. (cancelled) 

5. (currently amended) The method of Claim 1 [[4]] wherein receiving a page 
description language representation of the document for providing a list of words in the 
document and position information for the words includes receiving a PDF 
representation of the document, and wherein converting the table data encompassed 
by each table bounding box to a markup language representation includes converting 
the table data encompassed by each table bounding box to a HTML representation. 
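
The conversion to an HTML representation mentioned in claim 5 could look roughly like the sketch below. It assumes the table data has already been arranged into rows of cell strings; the function name `rows_to_html` and that input format are assumptions made for illustration, and only the standard-library `html.escape` function is used.

```python
from html import escape
from typing import List


def rows_to_html(rows: List[List[str]]) -> str:
    """Render rows of cell text as a simple HTML table.

    Cell text is escaped so characters such as '&' and '<' survive
    the conversion.
    """
    lines = ["<table>"]
    for row in rows:
        cells = "".join(f"<td>{escape(cell)}</td>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


html_table = rows_to_html([["Item", "Price"], ["A&B", "5"]])
```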

6. (cancelled) 

7. (currently amended) A computer-readable medium having stored thereon sequences 
of instructions, said sequences of instructions including instructions which, when 
executed by a processor, cause said processor to perform the steps of: 

[[a)]] receiving a page description language representation of a document for 
providing a list of words in the document and position information for the words; and 
[[b)]] automatically identifying table data in the document based on the page 
description language representation of the document and at least one table identifying 
feature, wherein identifying includes, 

[[b1)]] dividing the document into one or more pages; 
[[b2)]] dividing each page into a plurality of lines; 
[[b3)]] for each line, clustering the words of the line into one or more word 
clusters, wherein each cluster includes one or more words, each cluster having a
horizontal beginning point, horizontal midpoint, and horizontal end point; and

[[b4)]] automatically identifying table data in the document based on the 
number of word clusters for each line and the alignment of horizontal beginning 
points, horizontal midpoints, and horizontal end points of the word clusters
between lines. 

8. (cancelled) 

9. (currently amended) The computer-readable medium of Claim 7 further containing 
instructions which, when executed by said processor, would cause said processor to
perform the steps of:

[[b4_1)]] using the word clusters to generate column position information
wherein the column information includes for each column a horizontal beginning point,
horizontal midpoint, and horizontal end point; and

[[b4_2)]] updating the column position information by performing a union
operation between the column position information of [[the]] a previous line and the
column position information of [[the]] a current line.

10. (cancelled) 

11. (cancelled)

12. (currently amended) A document processing system comprising: 

[[a)]] a processor for executing programs; and 

[[b)]] a table identification program for receiving a page description language 
representation of a document, the page description language representation providing a 
list of words in the document and position information for the words, and for 
automatically identifying table data in the document based on the page description 
representation of the document and at least one table identifying feature, wherein the 
identification program is configured to: [[includes a bounding box generation module for
receiving the list of words and for automatically generating a table bounding box for
each table in the document based on the number of word clusters in each line.]]

divide the document into one or more pages; 

divide each page into a plurality of lines; 
for each line, cluster the words of the line into one or more word clusters,
wherein each cluster includes one or more words, each cluster having a
horizontal beginning point, horizontal midpoint, and horizontal end point; and

automatically identify table data in the document based on the number of 
word clusters for each line and the alignment of horizontal beginning points,
horizontal midpoints, and horizontal end points of the word clusters between 
lines. 
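
One way to picture the alignment test between lines recited in claim 12 is sketched below. It is only an assumed reading: the tolerance `tol`, the minimum count `min_aligned`, and the function names are hypothetical, and the claim itself does not say how close two points must be to count as aligned.

```python
from typing import List, Tuple

# Assumed layout: (horizontal beginning point, midpoint, end point).
Cluster = Tuple[float, float, float]


def aligned_count(line_a: List[Cluster], line_b: List[Cluster],
                  tol: float = 3.0) -> int:
    """Count clusters in line_a that align with some cluster in line_b.

    Two clusters align when their beginning points, their midpoints, or
    their end points differ by at most tol.
    """
    count = 0
    for a in line_a:
        if any(abs(pa - pb) <= tol for b in line_b for pa, pb in zip(a, b)):
            count += 1
    return count


def lines_align(line_a: List[Cluster], line_b: List[Cluster],
                min_aligned: int = 2, tol: float = 3.0) -> bool:
    """Treat two lines as table-like neighbours if enough clusters align."""
    return aligned_count(line_a, line_b, tol) >= min_aligned


row1 = [(10, 30, 50), (100, 120, 140), (200, 215, 230)]
row2 = [(10, 20, 30), (110, 125, 140), (190, 210, 230)]
paragraph = [(10, 150, 290)]
```

Here `row1` and `row2` share a left-aligned first column and right-aligned second and third columns, so all three clusters align, whereas the single wide cluster of `paragraph` aligns with only one cluster of `row1`.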

13. (cancelled) 

14. (cancelled) 

15. (currently amended) The document processing system of claim 12 wherein the
table identification program further comprises: 

[[b3)]] a conversion module coupled to the bounding box generation module for 
receiving the table bounding box for each table in the document, and for converting the 
words encompassed by the table bounding box into a markup language representation 
that maintains the table structure of each table. 

16. (currently amended) The method of claim 1 wherein the step of automatically 
identifying table data in the document based on the page description language 
representation of the document and at least one table identifying feature further 
comprises: 

[[b1)]] automatically identifying table data in the document based on one or more 
table headings. 

17. (currently amended) The method of claim 1 wherein the step of automatically 
identifying table data in the document based on the page description language 
representation of the document and at least one table identifying feature further 
comprises: 

[[b1)]] automatically identifying table data in the document based on one or more 
horizontal lines and vertical lines that separate rows or columns of the table. 

18. (new) The method of claim 1, wherein the step of automatically identifying table
data in the document based on the number of word clusters for each line and the 
alignment of the word clusters comprises: 

determining whether the number of word clusters in a line is greater than a 
threshold value; and 

classifying the word clusters in the line as a row of a table in response to the 
number of word clusters in a line being greater than the threshold value. 
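
The threshold test of claim 18 can be illustrated with a short sketch. The threshold value, the labels `"row"` and `"text"`, and the function name `classify_lines` are assumptions chosen for the example; the comparison is strictly "greater than", as the claim recites.

```python
from typing import List


def classify_lines(cluster_counts: List[int], threshold: int = 2) -> List[str]:
    """Label each line by its number of word clusters.

    A line is labelled "row" only when its cluster count is strictly
    greater than the threshold; otherwise it is labelled "text".
    """
    return ["row" if n > threshold else "text" for n in cluster_counts]


labels = classify_lines([1, 3, 3, 2, 4], threshold=2)
```

Note that a line whose cluster count equals the threshold is not classified as a row under this reading.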

19. (new) The computer-readable medium of claim 7, wherein the instructions for
automatically identifying table data in the document based on the number of word 
clusters for each line and the alignment of the word clusters include instructions that 
when executed by a processor cause the processor to perform the steps further 
comprising: 

determining whether the number of word clusters in a line is greater than a 
threshold value; and 

classifying the word clusters in the line as a row of a table in response to the 
number of word clusters in a line being greater than the threshold value. 

20. (new) The document processing system of claim 12, wherein the table 
identification program is further configured to: 

determine whether the number of word clusters in a line is greater than a 
threshold value; and 

classify the word clusters in the line as a row of a table in response to the 
number of word clusters in a line being greater than the threshold value. 